What are Outliers? 

• Outlier -> a data object that deviates significantly from the rest of the objects 

  • Ex: a student with exceptionally high grades

• Normal versus anomalous data objects -> how to define normalcy? 

• Outliers versus noise -> randomness, repetition patterns



Global

• Deviate significantly from the rest of the dataset
• Ex: Intrusion detection
• How to measure deviation?

Contextual

• Deviate with respect to context (time, location)
• Ex: Temperature values
• Contextual attributes used to evaluate context
• Behavioral attributes used to evaluate outlier

Collective

• A subset of objects collectively deviates significantly from the dataset
• Ex: Multiple order delays, DoS attacks

Challenges for Outlier Detection

Modeling normal objects & outliers

• Normal models are challenging to build
• Distinction between normalcy and anomaly is ambiguous

Application-specific outlier detection

• How much deviation is considered an anomaly/outlier?

Handling noise in outlier detection

• How to detect outliers in the presence of noise?
• Noise sometimes "hides" outliers!







Outlier Detection Methods
Statistical Methods 

• Parametric


  • For univariate outliers -> boxplots -> parameters are median and IQR

  • For multivariate outliers -> χ²-statistic -> parameter is mean
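
A minimal Python sketch of the two parametric checks above, under stated assumptions. The fence multiplier `k = 1.5` is the common boxplot convention, not something these notes specify. The χ² form, sum of (o_i - E_i)² / E_i with E_i the mean of dimension i, is one standard formulation and needs nonzero means. The function names and sample values are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def chi_square_score(obj, means):
    """Chi-square statistic of one object against the per-dimension means.

    Assumes every mean is nonzero. A large score suggests an outlier;
    the cutoff is application-specific and not fixed here.
    """
    return sum((o - e) ** 2 / e for o, e in zip(obj, means))

# Hypothetical grades: one student scores far above the rest.
grades = [70, 72, 68, 75, 71, 69, 73, 74, 99]
print(boxplot_outliers(grades))

# Hypothetical 2-D object compared with the dataset means (10, 20).
print(chi_square_score([20, 20], [10, 20]))
```

The boxplot rule needs no distribution fit beyond the quartiles, while the χ² score collapses all dimensions into one number that still has to be compared against a chosen threshold.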

• Nonparametric -> learn normal model from input data 


  • Histograms
    • A transaction with the amount of $7500 is considered an outlier
    • It does not belong to any of the well-populated bins (only 0.2% of transactions are > $5000)
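
The histogram idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the method from the lecture: the bin width of $1000, the 1% frequency cutoff, the function names, and the synthetic transaction amounts are all made up for the example.

```python
from collections import Counter

def build_histogram(values, bin_width):
    """Map each bin index to the fraction of training values in that bin."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b: c / total for b, c in counts.items()}

def in_rare_bin(value, hist, bin_width, min_fraction=0.01):
    """True when the value's bin is empty or holds less than min_fraction of the data."""
    return hist.get(int(value // bin_width), 0.0) < min_fraction

# Synthetic amounts: 500 transactions below $5000 plus a single one at $6000.
amounts = list(range(0, 5000, 10)) + [6000]
hist = build_histogram(amounts, bin_width=1000)

print(in_rare_bin(7500, hist, bin_width=1000))  # bin was never seen in the data
print(in_rare_bin(2500, hist, bin_width=1000))  # bin is well populated
```

The result depends heavily on the bin width: bins that are too narrow make normal values look rare, and bins that are too wide can swallow real outliers.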

Proximity-based Methods

• Distance-based -> for an object o, examine the number of other objects in its r-neighborhood
  • r is a distance threshold
  • n is a fraction threshold -> minimum fraction of objects needed in the neighborhood (fewer means o is an outlier)
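
A brute-force Python sketch of the distance-based test, assuming Euclidean distance and assuming that an object is flagged when the fraction of other objects inside its r-neighborhood falls below the threshold. The function name, the parameter name `min_fraction`, and the sample points are hypothetical.

```python
from math import dist

def distance_based_outliers(points, r, min_fraction):
    """Return the points whose r-neighborhood holds too few other objects."""
    n = len(points)
    if n < 2:
        return []
    outliers = []
    for i, o in enumerate(points):
        # Count the other objects within distance r of o.
        neighbors = sum(
            1 for j, p in enumerate(points) if j != i and dist(o, p) <= r
        )
        # Assumption: o is an outlier when the neighbor fraction is below the threshold.
        if neighbors / (n - 1) < min_fraction:
            outliers.append(o)
    return outliers

# Four points close together and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(distance_based_outliers(points, r=2.0, min_fraction=0.5))
```

This nested loop compares every pair of objects, so it is only practical for small datasets; larger ones usually need an index or pruning.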


• Density-based -> for an object o, examine its density relative to the density of its local neighbors
  • A local outlier factor (LOF) is computed in terms of the k-NN of an object in comparison to its neighbors
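
A compact Python sketch of one common LOF formulation (k-distance, reachability distance, local reachability density), given here as an assumption about what the slide refers to. It uses Euclidean distance and brute-force neighbor search, and it assumes there are more than k points and no group of more than k identical points. The function name and data are hypothetical.

```python
from math import dist

def lof_scores(points, k):
    """Return one LOF score per point; values well above 1 suggest outliers."""
    n = len(points)
    if k < 1 or k >= n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    k_dist = []     # distance from each point to its k-th nearest neighbor
    neighbors = []  # indices of all points within that k-distance
    for i in range(n):
        ranked = sorted(
            (dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        kd = ranked[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in ranked if d <= kd])

    def local_reach_density(i):
        # Reachability distance of i from j is max(k-distance(j), dist(i, j)).
        total = sum(
            max(k_dist[j], dist(points[i], points[j])) for j in neighbors[i]
        )
        return len(neighbors[i]) / total

    lrd = [local_reach_density(i) for i in range(n)]
    # LOF compares the density of each neighbor with the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for p, score in zip(points, lof_scores(points, k=2)):
    print(p, round(score, 2))
```

A score near 1 means the object is about as dense as its neighbors. Unlike the distance-based test, this compares each object against its local surroundings rather than against one global threshold.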

Clustering-based Methods

Classification-based Methods 

• Use training data to build a normal model (one class)
  • Any samples that do not belong to the normal class are outliers
• Learn the decision boundaries of the normal class 

  • Using SVMs, for example (this was not discussed in class)
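
To make the one-class idea concrete, here is a deliberately simple Python stand-in. It is not an SVM: it learns a spherical boundary (centroid plus radius) from normal training samples only and treats anything outside it as an outlier. The `slack` factor, the function names, and the data are assumptions made for illustration.

```python
from math import dist

def fit_normal_model(train):
    """Learn a centroid and radius that enclose the normal training samples."""
    dims = len(train[0])
    center = tuple(sum(p[d] for p in train) / len(train) for d in range(dims))
    radius = max(dist(p, center) for p in train)
    return center, radius

def outside_normal_class(sample, model, slack=1.1):
    """True when the sample lies beyond the learned boundary (radius * slack)."""
    center, radius = model
    return dist(sample, center) > slack * radius

# Training data contains normal samples only (one class).
normal_samples = [(1, 1), (1, 2), (2, 1), (2, 2)]
model = fit_normal_model(normal_samples)

print(outside_normal_class((1.6, 1.4), model))  # close to the training data
print(outside_normal_class((8, 8), model))      # far from the training data
```

A one-class SVM plays the same role with a more flexible boundary. The point here is only that the model is trained without any labeled outliers.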
Mining Contextual and Collective Outliers

• Contextual outliers:

1. Determine contextual attributes 


2. Group the values of those attributes (each group is a context)


3. Determine context of object (its group) 

4. Do conventional outlier detection within that group 


5. Ex -> customers of similar age who live in the same area may behave similarly -> a customer in that age group and area who behaves differently is an outlier
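
The steps above can be sketched in Python as follows. The assumptions: records are dictionaries, the contextual attributes are already grouped (for example, age is stored as an age group), and the "conventional" detector inside each group is a z-score test with a threshold of 2. The field names, threshold, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def contextual_outliers(records, context_keys, behavior_key, z_threshold=2.0):
    """Flag records whose behavior deviates from others in the same context."""
    # Steps 1-3: place each record in the group defined by its contextual attributes.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in context_keys)].append(rec)

    # Step 4: conventional detection (here a z-score test) within each group.
    flagged = []
    for members in groups.values():
        values = [m[behavior_key] for m in members]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all behave identically, nothing to flag
        flagged.extend(
            m for m in members if abs(m[behavior_key] - mu) / sigma > z_threshold
        )
    return flagged

young_north = [100, 110, 95, 105, 90, 100, 98, 102, 900]
older_south = [880, 900, 920, 910, 890]
records = (
    [{"age_group": "20-30", "area": "north", "spend": s} for s in young_north]
    + [{"age_group": "60-70", "area": "south", "spend": s} for s in older_south]
)
print(contextual_outliers(records, ["age_group", "area"], "spend"))
```

In this made-up data a spend of 900 is flagged for the younger northern group but is ordinary in the older southern group, which is what distinguishes a contextual outlier from a global one.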

• Collective outliers -> challenging and advanced area 



